Efficient Bayesian estimation of Markov model transition matrices with given stationary distribution
Direct simulation of biomolecular dynamics in thermal equilibrium is
challenging due to the metastable nature of conformation dynamics and the
computational cost of molecular dynamics. Biased or enhanced sampling methods
may improve the convergence of expectation values of equilibrium probabilities
and expectation values of stationary quantities significantly. Unfortunately,
the convergence of dynamic observables such as correlation functions or
timescales of conformational transitions relies on direct equilibrium
simulations. Markov state models are well suited to describe both stationary
properties and properties of slow dynamical processes of a molecular system, in
terms of a transition matrix for a jump process on a suitable discretization
of continuous conformation space. Here, we introduce statistical estimation
methods that allow a priori knowledge of equilibrium probabilities to be
incorporated into the estimation of dynamical observables. Both maximum
likelihood methods and an improved Monte Carlo sampling method for reversible
transition matrices with fixed stationary distribution are given. The
sampling approach is applied to a toy example as well as to simulations of the
MR121-GSGS-W peptide, and is demonstrated to converge much more rapidly than a
previous approach in [F. Noe, J. Chem. Phys. 128, 244103 (2008)].
Comment: 15 pages, 8 figures
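The constraint named in the abstract above, a reversible transition matrix with a known stationary distribution, can be illustrated with a small sketch. This is not the estimator or sampler from the paper. It only uses the standard fact that row-normalizing a symmetric non-negative matrix X gives a transition matrix that satisfies detailed balance with respect to the normalized row sums of X. The function names and the example matrix are hypothetical.

```python
import numpy as np

def reversible_from_symmetric(X):
    """Build a reversible transition matrix T and its stationary vector pi
    from a symmetric, non-negative matrix X (illustrative construction)."""
    X = np.asarray(X, dtype=float)
    if not np.allclose(X, X.T):
        raise ValueError("X must be symmetric")
    row_sums = X.sum(axis=1)
    T = X / row_sums[:, None]        # T_ij = X_ij / sum_k X_ik
    pi = row_sums / row_sums.sum()   # pi_i proportional to sum_k X_ik
    return T, pi

def implied_timescales(T, lag=1.0):
    """Relaxation timescales t_i = -lag / ln(lambda_i) for eigenvalues below 1."""
    evals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(evals[1:])

# Hypothetical three-state example (values chosen only for illustration).
X = np.array([[80.0, 10.0, 0.0],
              [10.0, 40.0, 10.0],
              [0.0, 10.0, 40.0]])
T, pi = reversible_from_symmetric(X)
flux = pi[:, None] * T               # detailed balance: flux_ij == flux_ji
timescales = implied_timescales(T)
```

Because the stationary vector is fixed by the row sums of X, any procedure that varies X while keeping its row sums constant stays on the set of reversible matrices with that stationary distribution.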
Spectral rate theory for projected two-state kinetics
Classical rate theories often fail in cases where the observable(s) or order
parameter(s) used are poor reaction coordinates or the observed signal is
deteriorated by noise, such that no clear separation between reactants and
products is possible. Here, we present a general spectral two-state rate theory
for ergodic dynamical systems in thermal equilibrium that explicitly takes into
account how the system is observed. The theory allows the systematic estimation
errors made by standard rate theories to be understood and quantified. We also
elucidate the connection of spectral rate theory with the popular Markov state
modeling (MSM) approach for molecular simulation studies. An optimal rate
estimator is formulated that gives robust and unbiased results even for poor
reaction coordinates and can be applied to both computer simulations and
single-molecule experiments. No definition of a dividing surface is required.
Another result of the theory is a model-free definition of the reaction
coordinate quality (RCQ). The RCQ can be bounded from below by the directly
computable observation quality (OQ), thus providing a measure allowing the RCQ
to be optimized by tuning the experimental setup. Additionally, the respective
partial probability distributions can be obtained for the reactant and product
states along the observed order parameter, even when these strongly overlap.
The effects of both filtering (averaging) and uncorrelated noise are also
examined. The approach is demonstrated on numerical examples and experimental
single-molecule force probe data of the p5ab RNA hairpin and the apo-myoglobin
protein at low pH, here focusing on the case of two-state kinetics.
Projected and Hidden Markov Models for calculating kinetics and metastable states of complex molecules
Markov state models (MSMs) have been successful in computing metastable
states, slow relaxation timescales and associated structural changes, and
stationary or kinetic experimental observables of complex molecules from large
amounts of molecular dynamics simulation data. However, MSMs approximate the
true dynamics by assuming a Markov chain on a cluster discretization of the
state space. This approximation is difficult to make for high-dimensional
biomolecular systems, and the quality and reproducibility of MSMs have therefore
been limited. Here, we discard the assumption that dynamics are Markovian on
the discrete clusters. Instead, we only assume that the full phase-space
molecular dynamics is Markovian, and a projection of this full dynamics is
observed on the discrete states, leading to the concept of Projected Markov
Models (PMMs). Robust estimation methods for PMMs are not yet available, but we
derive a practically feasible approximation via Hidden Markov Models (HMMs). It
is shown how various molecular observables of interest that are often computed
from MSMs can be computed from HMMs / PMMs. The new framework is applicable to
both simulation and single-molecule experimental data. We demonstrate its
versatility by applications to educative model systems, a 1 ms Anton MD
simulation of the BPTI protein, and an optical tweezer force probe trajectory
of an RNA hairpin.
A Standard Protocol for the Calibration of Capillary Electrophoresis (CE) Equipment
Calibration of complex analytical systems is always a difficult task. Nevertheless, a suitable approach has to be designed before the systems can be introduced into routine analysis. In the literature, many methods have been described for the purpose of calibrating such systems, but only a few of them deal with capillary electrophoresis. Here, we demonstrate a general approach that makes the calibration of this type of analytical instrument feasible.
Slow collective variables and molecular kinetics from short off-equilibrium simulations
Markov state models (MSMs) and master equation models are popular approaches
to approximate molecular kinetics, equilibria, metastable states, and reaction
coordinates in terms of a state space discretization usually obtained by
clustering. Recently, a powerful generalization of MSMs has been introduced,
the variational approach for conformation dynamics/molecular kinetics (VAC) and
its special case the time-lagged independent component analysis (TICA), which
allow us to approximate slow collective variables and molecular kinetics by
linear combinations of smooth basis functions or order parameters. While it is
known how to estimate MSMs from trajectories whose starting points are not
sampled from an equilibrium ensemble, this has not yet been the case for TICA
and the VAC. Previous estimates from short trajectories have been strongly
biased and thus not variationally optimal. Here, we employ the Koopman
operator theory and the ideas from dynamic mode decomposition to extend the
VAC and TICA to non-equilibrium data. The main insight is that the VAC and
TICA provide a coefficient matrix that we call the Koopman model, as it
approximates the underlying dynamical (Koopman) operator in conjunction with
the basis set used. This Koopman model can be used to compute a stationary
vector to reweight the data to equilibrium. From such a Koopman-reweighted
sample, equilibrium expectation values and variationally optimal reversible
Koopman models can be constructed even with short simulations. The Koopman
model can be used to propagate densities, and its eigenvalue decomposition
provides estimates of relaxation time scales and slow collective variables for
dimension reduction. Koopman models are generalizations of Markov state
models, TICA, and the linear VAC and allow molecular kinetics to be described
without a cluster discretization.
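The coefficient matrix mentioned in the abstract above can be sketched as a least-squares fit between instantaneous and time-lagged basis-function values. The sketch below is a minimal illustration under stated assumptions: the basis functions are the mean-free coordinates themselves, the data come from a hypothetical two-dimensional linear test process, and the equilibrium reweighting step described in the abstract is not included. The function name `koopman_matrix` is an illustrative choice, not an API from the paper.

```python
import numpy as np

def koopman_matrix(traj, lag):
    """Least-squares coefficient matrix K with X_lagged ~ X_0 @ K,
    computed as K = C00^{-1} C0t from (n_frames, n_features) data."""
    X0 = traj[:-lag]
    Xt = traj[lag:]
    mean = X0.mean(axis=0)
    X0 = X0 - mean
    Xt = Xt - mean
    C00 = X0.T @ X0 / len(X0)   # instantaneous covariance
    C0t = X0.T @ Xt / len(X0)   # time-lagged covariance
    return np.linalg.solve(C00, C0t)

# Hypothetical test process: two independent AR(1) coordinates with
# autocorrelation 0.9 (slow) and 0.5 (fast) per step.
rng = np.random.default_rng(0)
n_frames = 20000
decay = np.array([0.9, 0.5])
traj = np.zeros((n_frames, 2))
for t in range(1, n_frames):
    traj[t] = decay * traj[t - 1] + rng.standard_normal(2)

lag = 1
K = koopman_matrix(traj, lag)
evals = np.sort(np.linalg.eigvals(K).real)[::-1]
timescales = -lag / np.log(np.abs(evals))   # relaxation timescales in steps
```

For this test process the eigenvalues of K should lie close to the per-step autocorrelations 0.9 and 0.5, so the slow coordinate has the longer estimated timescale.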
DeepQMC: An open-source software suite for variational optimization of deep-learning molecular wave functions
Computing accurate yet efficient approximations to the solutions of the electronic Schrödinger equation has been a paramount challenge of computational chemistry for decades. Quantum Monte Carlo methods are a promising avenue of development as their core algorithm exhibits a number of favorable properties: it is highly parallel and scales favorably with the considered system size, with an accuracy that is limited only by the choice of the wave function Ansatz. The recently introduced machine-learned parametrizations of quantum Monte Carlo Ansätze rely on the efficiency of neural networks as universal function approximators to achieve state-of-the-art accuracy on a variety of molecular systems. With interest in the field growing rapidly, there is a clear need for easy-to-use, modular, and extendable software libraries facilitating the development and adoption of this new class of methods. In this contribution, the DeepQMC program package is introduced, in an attempt to provide a common framework for future investigations by unifying many of the currently available deep-learning quantum Monte Carlo architectures. Furthermore, the manuscript provides a brief introduction to the methodology of variational quantum Monte Carlo in real space, highlights some technical challenges of optimizing neural network wave functions, and presents example black-box applications of the program package. We thereby intend to make this novel field accessible to a broader class of practitioners from both the quantum chemistry and the machine learning communities.
Ensemble learning of coarse-grained molecular dynamics force fields with a kernel approach
Gradient-domain machine learning (GDML) is an accurate and efficient approach to learn a molecular potential and associated force field based on the kernel ridge regression algorithm. Here, we demonstrate its application to learn an effective coarse-grained (CG) model from all-atom simulation data in a sample-efficient manner. The CG force field is learned by following the thermodynamic consistency principle, here by minimizing the error between the predicted CG force and the all-atom mean force in the CG coordinates. Solving this problem by GDML directly is impossible because coarse-graining requires averaging over many training data points, resulting in impractical memory requirements for storing the kernel matrices. In this work, we propose a data-efficient and memory-saving alternative. Using ensemble learning and stratified sampling, we propose a 2-layer training scheme that enables GDML to learn an effective CG model. We illustrate our method on a simple biomolecular system, alanine dipeptide, by reconstructing the free energy landscape of a CG variant of this molecule. Our novel GDML training scheme yields a smaller free energy error than neural networks when the training set is small, and a comparably high accuracy when the training set is sufficiently large.
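The memory argument in the abstract above can be made concrete with a generic sketch. This is not the GDML code or the paper's 2-layer scheme. It is plain kernel ridge regression on a hypothetical one-dimensional double-well force, with uniform random batches standing in for stratified sampling. Each ensemble member stores only a batch-by-batch kernel matrix, and the averaged prediction approximates the mean force from noisy instantaneous forces. All names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between 1D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2)

def fit_krr(x, y, sigma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    K = gaussian_kernel(x, x, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)
    return x, alpha

def predict_krr(model, x_new, sigma):
    x_train, alpha = model
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

def mean_force(x):
    """Force -dU/dx of the hypothetical potential U(x) = x^4 - 2 x^2."""
    return -4.0 * x**3 + 4.0 * x

rng = np.random.default_rng(0)
n_total, batch, n_models = 20000, 200, 20   # assumed sizes
sigma, lam = 0.5, 1.0                       # assumed hyperparameters

# Noisy instantaneous forces scattered around the mean force.
x_all = rng.uniform(-1.5, 1.5, n_total)
f_all = mean_force(x_all) + rng.standard_normal(n_total)

# Each member sees a small random batch: a 200 x 200 kernel matrix
# instead of a 20000 x 20000 one.
models = []
for _ in range(n_models):
    idx = rng.choice(n_total, size=batch, replace=False)
    models.append(fit_krr(x_all[idx], f_all[idx], sigma, lam))

x_grid = np.linspace(-1.2, 1.2, 50)
pred = np.mean([predict_krr(m, x_grid, sigma) for m in models], axis=0)
mae = np.mean(np.abs(pred - mean_force(x_grid)))
```

Averaging over members reduces the variance that each small batch inherits from the force noise, which is the generic reason an ensemble can trade one large kernel matrix for many small ones.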
Machine learning implicit solvation for molecular dynamics
Accurate modeling of the solvent environment for biological molecules is crucial for computational biology and drug design. A popular approach to achieve long simulation time scales for large system sizes is to incorporate the effect of the solvent in a mean-field fashion with implicit solvent models. However, a challenge with existing implicit solvent models is that they often lack accuracy or certain physical properties compared to explicit solvent models as the many-body effects of the neglected solvent molecules are difficult to model as a mean field. Here, we leverage machine learning (ML) and multi-scale coarse graining (CG) in order to learn implicit solvent models that can approximate the energetic and thermodynamic properties of a given explicit solvent model with arbitrary accuracy, given enough training data. Following the previous ML–CG models CGnet and CGSchnet, we introduce ISSNet, a graph neural network, to model the implicit solvent potential of mean force. ISSNet can learn from explicit solvent simulation data and be readily applied to molecular dynamics simulations. We compare the solute conformational distributions under different solvation treatments for two peptide systems. The results indicate that ISSNet models can outperform widely used generalized Born and surface area models in reproducing the thermodynamics of small protein systems with respect to explicit solvent. The success of this novel method demonstrates the potential benefit of applying machine learning methods in accurate modeling of solvent effects for in silico research and biomedical applications.